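This paper presents CoLLIE: a simple yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model in which language and images are projected into the same semantic space (in this case, CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. Unlike traditional few-shot learning, the model does not just learn new classes and labels, but can also generalize to similar language use. We verify the model's performance on two different continual learning tasks and show that it can efficiently learn and generalize from only a few examples, with little interference with the model's original zero-shot performance.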
The materials science literature contains up-to-date and comprehensive scientific knowledge of materials. However, its content is unstructured and diverse, resulting in a significant gap in providing sufficient information for material design and synthesis. To this end, we used natural language processing (NLP) and computer vision (CV) techniques based on convolutional neural networks (CNNs) to discover valuable experiment-based information about nanomaterials and synthesis methods in energy-material-related publications. Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively. Our second system, GraphMaster, extracts data from tables and figures in publications with 98.3% classification accuracy and 4.3% data extraction mean square error. Our results show that these systems could assess the suitability of materials for a certain application by evaluating synthesis insights and case analysis with detailed references. This work offers a fresh perspective on mining knowledge from scientific literature, providing a wide swatch to accelerate nanomaterial research through CNNs.
Recently, the use of synthetic training data has been on the rise as it offers correctly labelled datasets at a lower cost. The downside of this technique is that the so-called domain gap between the real target images and synthetic training data leads to a decrease in performance. In this paper, we attempt to provide a holistic overview of how to use synthetic data for object detection. We analyse aspects of generating the data as well as techniques used to train the models. We do so by devising a number of experiments, training models on the Dataset of Industrial Metal Objects (DIMO). This dataset contains both real and synthetic images. The synthetic part has different subsets that are either exact synthetic copies of the real data or are copies with certain aspects randomised. This allows us to analyse what types of variation are good for synthetic training data and which aspects should be modelled to closely match the target data. Furthermore, we investigate what types of training techniques are beneficial towards generalisation to real data, and how to use them. Additionally, we analyse how real images can be leveraged when training on synthetic images. All these experiments are validated on real data and benchmarked to models trained on real data. The results offer a number of interesting takeaways that can serve as basic guidelines for using synthetic data for object detection. Code to reproduce results is available at https://github.com/EDM-Research/DIMO_ObjectDetection.
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits--from local and global semantic edits to image stylization--while maintaining fidelity to the original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
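The idea of two coupled vectors that invert each other in alternation can be illustrated with a minimal sketch. This is not EDICT's actual update rule: the mixing function `f` and the coefficients `a` and `b` are hypothetical stand-ins chosen only to show why an affine-coupling-style update is invertible in closed form, without ever inverting `f` itself.

```python
import math

def f(v):
    """Hypothetical stand-in for a non-invertible network evaluation."""
    return [math.tanh(u) for u in v]

def forward_step(x, y, a=0.9, b=0.1):
    """Update x from y, then y from the *new* x (alternating coupling)."""
    x_new = [a * xi + b * fi for xi, fi in zip(x, f(y))]
    y_new = [a * yi + b * fi for yi, fi in zip(y, f(x_new))]
    return x_new, y_new

def inverse_step(x_new, y_new, a=0.9, b=0.1):
    """Undo forward_step by solving the two affine updates in reverse order."""
    y = [(yn - b * fi) / a for yn, fi in zip(y_new, f(x_new))]
    x = [(xn - b * fi) / a for xn, fi in zip(x_new, f(y))]
    return x, y

def run(x, y, steps, step_fn):
    """Apply step_fn repeatedly to the pair (x, y)."""
    for _ in range(steps):
        x, y = step_fn(x, y)
    return x, y
```

Because each update is affine in the vector being changed and depends on the other vector only through `f`, running `inverse_step` the same number of times recovers the starting pair up to floating-point rounding.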
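Flow analysis carried out using phase-contrast cardiac magnetic resonance imaging (PC-CMR) enables the quantification of important parameters used in the assessment of cardiovascular function. An essential part of this analysis is the identification of the correct CMR views and quality control (QC) to detect artefacts that could affect the flow quantification. We propose a novel deep learning-based framework for the fully automated analysis of flow from full CMR scans. It first carries out these view selection and QC steps using two sequential convolutional neural networks, followed by automatic aorta and pulmonary artery segmentation to enable the quantification of key flow parameters. Accuracy values of 0.958 and 0.914 were obtained for view classification and QC, respectively. For segmentation, Dice scores were greater than 0.969, and the Bland-Altman plots indicated excellent agreement between manual and automatic peak flow values. In addition, we tested our pipeline on an external validation dataset, with results indicating good robustness of the pipeline. This work was carried out using multi-vendor clinical data consisting of 986 cases, indicating the potential for use of this pipeline in a clinical setting.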
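For readers unfamiliar with the Dice score reported for segmentation, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two binary masks. The function name, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details taken from the paper.

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the automatic and manual masks coincide exactly, and 0.0 means they share no foreground pixels.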
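Bandit algorithms are often used in the e-commerce industry to train machine learning (ML) systems when labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges of off-policy optimisation, delayed reward, concept drift, reward design, and business rule constraints. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of the above-mentioned challenges, to give future practitioners, researchers, and educators a resource for addressing the challenges encountered in the e-commerce industry.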
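In computer vision, there has been significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, where the potential impact of bias is arguably much greater, there has been less interest. In medical imaging pipelines, segmentation of structures of interest plays an important role in estimating clinical biomarkers that are subsequently used to inform patient management. Convolutional neural networks (CNNs) are starting to be used to automate this process. We present the first systematic study of the impact of training set imbalance on race and sex bias in CNN-based segmentation. We focus on segmentation of the structures of the heart from short-axis cine cardiac magnetic resonance images, and train multiple CNN segmentation models with different levels of race/sex imbalance. We find no significant bias in the sex experiment but significant bias in two separate race experiments, highlighting the need to consider adequate representation of different demographic groups in health datasets.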
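Because structured data are often insufficient, labels need to be extracted from free text in electronic health records when developing models for clinical information retrieval and decision support systems. One of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large-scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. Using the Erasmus Medical Center Dutch Clinical Corpus, we compared a rule-based method based on ContextD, a biLSTM model using MedCAT, and (fine-tuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision, and recall. In addition, we systematically categorized the classification errors of each model, which can be used to further improve model performance in particular applications. Combining the three models was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate at detecting clinical negation, but that ultimately all three approaches can be viable depending on the use case at hand.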
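The three metrics used for the comparison can be made concrete with a small sketch. It computes precision, recall, and F1 for one positive class from parallel lists of gold and predicted labels. The label strings `"negated"` and `"not_negated"` and the function name are illustrative assumptions, not the corpus's actual annotation scheme.

```python
def precision_recall_f1(gold, predicted, positive="negated"):
    """Return (precision, recall, F1) for the given positive label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Precision penalizes findings wrongly marked as negated, recall penalizes missed negations, and F1 is their harmonic mean, so a model must do well on both to score highly.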
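Active inference provides a general framework for behavior and learning in autonomous agents. It states that an agent will attempt to minimize its variational free energy, defined in terms of beliefs over observations, internal states, and policies. Traditionally, every aspect of a discrete active inference model must be specified by hand, i.e., by manually defining the hidden state space structure as well as the required distributions, such as likelihood and transition probabilities. Recently, efforts have been made to learn state space representations automatically from observations using deep neural networks. However, these models are typically overparameterized and may overfit the data at hand. In this paper, we present a novel approach to learning state spaces using quantum physics-inspired tensor networks. The ability of tensor networks to represent the probabilistic nature of quantum states, as well as to reduce large state spaces, makes them a natural candidate for active inference. We show how tensor networks can be used as a generative model for sequential data. Furthermore, we show how beliefs can be obtained from such a generative model and how an active inference agent can use these beliefs to compute the expected free energy. Finally, we demonstrate our method on the classic T-maze environment.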
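Capturing an event from multiple camera angles can give viewers the most complete and interesting picture of that event. To be suitable for broadcasting, a human director needs to decide what to show at each point in time. This can become cumbersome as the number of camera angles increases. The introduction of omnidirectional or wide-angle cameras has allowed events to be captured more completely, making the director's task even more difficult. In this paper, a system is presented that, given multiple ultra-high-resolution video streams of an event, can generate a visually pleasing sequence of shots that follows the relevant action of the event. Because the algorithm is general purpose, it can be applied to most scenarios that feature humans. The proposed method allows for online processing when real-time broadcasting is required, as well as offline processing when the quality of the camera operation is the priority. Object detection is used to detect humans and other objects of interest in the input streams. Detected persons of interest, along with a set of rules based on cinematic conventions, are used to determine which video stream to show and which part of that stream is virtually framed. The user can provide a number of settings that determine how these rules are interpreted. The system can handle input from different wide-angle video streams by removing lens distortions. A user study shows, for a number of different scenarios, that the proposed automated director is able to capture an event with aesthetically pleasing video compositions and human-like shot-switching behavior.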